- Toggle-able thinking mode from settings
- Fixed race issue in model loading/cancelling
- UTF-8 parser now matches the llamacpp one
- Fixed error in VLM where the camera stayed on even when you exit that section
- Default sys prompt when loading
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the review settings.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info: Configuration used: defaults · Review profile: CHILL · Plan: Pro · Files selected for processing: 1
📝 Walkthrough

Adds end-to-end "thinking" support: model metadata/registration changes, persisted thinking-mode settings and UI, extraction/handling of embedded content across generation and voice flows, streaming cancellation and UTF‑8 robustness in native backends, plus related UI, tooling, and utility updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ChatUI as Chat Interface
    participant Settings as SettingsViewModel
    participant LLMVM as LLMViewModel
    participant LLMSDK as RunAnywhere LLM SDK
    participant CppBackend as C++ Backend
    User->>ChatUI: Toggle Thinking Mode
    ChatUI->>Settings: set thinkingModeEnabled
    Settings->>Settings: persist
    User->>ChatUI: Send prompt
    ChatUI->>LLMVM: startGeneration(prompt)
    LLMVM->>Settings: read thinkingModeEnabled
    LLMVM->>LLMVM: check loadedModelSupportsThinking
    alt thinking enabled & supported
        LLMVM->>LLMSDK: generate(with thinking prefix)
    else
        LLMVM->>LLMSDK: generate(original prompt)
    end
    LLMSDK->>CppBackend: call C++ generate (streaming)
    CppBackend-->>LLMSDK: stream tokens (may include <think> tags)
    LLMSDK->>LLMVM: LLMGenerationResult(text, thinkingContent)
    LLMVM->>ChatUI: update message with stripped display text
```
```mermaid
sequenceDiagram
    participant User
    participant VoiceUI as Voice Assistant
    participant VoiceVM as VoiceAgentViewModel
    participant VoiceSession as Voice Session (SDK)
    participant LLMSDK as RunAnywhere LLM SDK
    participant TTSSDK as TTS SDK
    User->>VoiceUI: Tap mic
    VoiceUI->>VoiceVM: micAction()
    alt currently speaking
        VoiceVM->>VoiceSession: interruptPlayback()
    else
        VoiceVM->>VoiceSession: startConversation()
    end
    VoiceSession->>LLMSDK: generate(effectivePrompt considering /no_think)
    LLMSDK-->>VoiceSession: result (text, thinkingContent)
    VoiceSession->>TTSSDK: synthesize(cleanedResponse)
    TTSSDK-->>VoiceSession: audio
    VoiceSession->>VoiceUI: play audio / emit events with thinkingContent
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)
```swift
if !config.thinkingModeEnabled {
    effectivePrompt = "/no_think\n\(transcription)"
} else {
    effectivePrompt = transcription
}
```
/no_think prefix added for non-thinking models
VoiceSessionConfig.thinkingModeEnabled is set to settings.loadedModelSupportsThinking && settings.thinkingModeEnabled in VoiceAgentViewModel. This collapses two independent flags into one boolean, so the session can no longer distinguish "thinking mode disabled for a model that supports thinking" from "model does not support thinking at all."
As a result, every model that doesn't have thinking mode on (including Llama, Phi, etc. that never support thinking) will have /no_think\n prepended to every single voice prompt. This is inconsistent with how LLMViewModel.applyThinkingModePrefix handles the same scenario:
```swift
// LLMViewModel — correct: skips prefix entirely for non-thinking models
guard loadedModelSupportsThinking else { return prompt }
```

The fix is to add a separate `modelSupportsThinking: Bool` field to `VoiceSessionConfig` and mirror the guard pattern used in `LLMViewModel`:
```swift
if config.modelSupportsThinking && !config.thinkingModeEnabled {
    effectivePrompt = "/no_think\n\(transcription)"
} else {
    effectivePrompt = transcription
}
```
```swift
Task {
    await LLM.shared.destroy()
    await STT.shared.destroy()
    await TTS.shared.destroy()
    await VAD.shared.destroy()
    await VoiceAgent.shared.destroy()
    await VLM.shared.destroy()
}
```
Swift destroy() tasks race with C shutdown
The Task { await … .destroy() } is fired-and-forgotten — it is not awaited before the synchronous C-level shutdown (Telemetry.shutdown(), Events.unregister(), etc.) continues. If the destroy() implementations reach into the C layer (e.g. to release handles), they will do so concurrently with or even after the underlying C objects have already been freed, creating a use-after-free.
Either await all destroy calls before proceeding, or ensure the C shutdown is run at the end of the same task:
```swift
Task {
    await LLM.shared.destroy()
    await STT.shared.destroy()
    await TTS.shared.destroy()
    await VAD.shared.destroy()
    await VoiceAgent.shared.destroy()
    await VLM.shared.destroy()
    // Now safe to run C-level shutdown
    Telemetry.shutdown()
    Events.unregister()
    ...
}
```
```swift
thinkingTokens: thinkingContent.map { _ in outputTokens } ?? 0,
responseTokens: outputTokens
```
thinkingTokens incorrectly set to total output tokens
thinkingContent.map { _ in outputTokens } sets thinkingTokens to the entire outputTokens value whenever thinking content is present. But outputTokens already represents the total completion tokens (thinking + response combined). Setting thinkingTokens = outputTokens and responseTokens = outputTokens means both fields report the same total, which double-counts every token and inflates any downstream metrics that sum the two.
The thinking token count cannot be known exactly without tracking it during generation (e.g. counting the tokens inside the <think> block), but at a minimum the value should not exceed outputTokens. Consider leaving it as 0 until a reliable count is available, or computing a rough estimate from thinkingContent length:
```swift
thinkingTokens: thinkingContent.map { _ in 0 } ?? 0,
```
```swift
emit(.responded(text: cleanedResponse, thinkingContent: thinkingContent))

// Step 4: Synthesize speech from cleaned response (no think tags spoken)
```
Missing Step 3 in numbered comment sequence
The inline comments jump from Step 2 to Step 4 with no Step 3, which is confusing for anyone reading through the pipeline:
```
// Step 1: Transcribe audio
// Step 2: Generate LLM response
// Step 4: Synthesize speech   ← should be Step 3
```
```diff
- // Step 4: Synthesize speech from cleaned response (no think tags spoken)
+ // Step 3: Synthesize speech from cleaned response (no think tags spoken)
```
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+ToolCalling.swift (1)
208-214: ⚠️ Potential issue | 🟡 Minor

**`buildFollowUpPrompt` loses `/no_think` prefix positioning in follow-up iterations.**

The initial prompt correctly positions `/no_think\n` at the beginning (lines 179-181). However, `buildFollowUpPrompt` receives the original `prompt` parameter and embeds it in the middle of a larger string:

```swift
"User: \(prompt)" // becomes: "User: /no_think\noriginal question..."
```

If the C++ inference layer requires `/no_think` at the prompt start, tool-calling loops after the first iteration will not suppress thinking correctly. Fix: extract and reapply the `/no_think` prefix in `buildFollowUpPrompt`, or pass `cleanPrompt` to the function instead of `prompt`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+ToolCalling.swift around lines 208 - 214, The follow-up prompt builder (buildFollowUpPrompt) is embedding the original prompt which may contain the `/no_think` prefix in the middle of the assembled message, so the C++ layer loses the required leading `/no_think`. Fix by ensuring the leading `/no_think` is preserved: either strip and capture the `/no_think` prefix from prompt into cleanPrompt before calling buildFollowUpPrompt and pass cleanPrompt instead of prompt, or modify buildFollowUpPrompt to detect and reapply the `/no_think` prefix at the very start of the returned string; update the call that sets fullPrompt to pass cleanPrompt (or reapply prefix) and keep references to buildFollowUpPrompt, fullPrompt, prompt, and cleanPrompt consistent.

examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift (1)
417-421: ⚠️ Potential issue | 🟡 Minor

**Keep a stable scroll target while streaming.**

After the first visible token arrives, the `"typing"` view disappears, but `scrollToBottom(proxy:animated:)` later in this file still targets `"typing"` whenever `isGenerating` is true. That leaves focus/keyboard-triggered scrolls pointing at an ID that no longer exists mid-generation. Use the last message ID once content is non-empty, or keep a persistent bottom anchor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift` around lines 417 - 421, The scrolling target "typing" disappears once the first token arrives, so change scroll target logic used by scrollToBottom(proxy:animated:) to choose a stable anchor: when viewModel.isGenerating is true but viewModel.messages.last?.content.isEmpty == false, use the last message's id (e.g., viewModel.messages.last?.id) instead of "typing", or create a persistent bottom anchor view ID (e.g., "bottom-anchor") and always scroll to that; update the TypingIndicatorView id usage (id("typing")) and the code that calls scrollToBottom(proxy:animated:) to select the last message ID when content is non-empty or fall back to the persistent "bottom-anchor" so scroll targets never point to a removed view.
🧹 Nitpick comments (6)
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Models/ModelListViewModel.swift (1)
136-136: Consider providing feedback when a load is already in progress.

The silent early return prevents race conditions, but users may be confused if they tap a model and nothing visibly happens. Consider either:

- Setting `isLoading = true` at the start of `selectModel` (similar to `loadModelsFromRegistry`), so the UI can show a loading indicator.
- Logging or providing minimal feedback that a load is already in progress.
This is minor since the UI may already handle this elsewhere.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Models/ModelListViewModel.swift` at line 136, The early return in selectModel guarded by isLoadingModel makes taps silently no-op; update selectModel (in ModelListViewModel) to give feedback by either setting the same loading state used in loadModelsFromRegistry (e.g., set isLoading = true / isLoadingModel = true at the start of selectModel) so the UI shows a spinner, or log/emit a minimal message when isLoadingModel is true to indicate a load is in progress; ensure you clear the loading flag on completion/failure just like loadModelsFromRegistry to avoid leaving the view stuck.sdk/runanywhere-commons/src/features/llm/llm_component.cpp (1)
772-788: Lock-free cancel is intentional but has a subtle race window.

The design correctly avoids taking the mutex so that `cancel` can interrupt an in-flight generation that holds the lock. However, accessing `component->lifecycle` (line 780) without the mutex could race with `rac_llm_component_destroy` or `rac_llm_component_unload` if called concurrently. This is likely acceptable since calling `cancel()` during component destruction would be a usage error, but consider documenting this constraint in the header or adding an assertion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/features/llm/llm_component.cpp` around lines 772 - 788, The cancel implementation reads component->lifecycle without holding the component mutex which can race with destruction/unload; to fix, either document the threading constraint in the public header (declare that rac_llm_component_cancel must not be called concurrently with rac_llm_component_destroy or rac_llm_component_unload) or add a defensive check/assert at the top of rac_llm_component_cancel (e.g., assert(component->lifecycle && "cannot cancel during destroy/unload") or return an error if lifecycle is null) before calling rac_lifecycle_get_service and rac_llm_cancel, referencing rac_llm_component_cancel, component->lifecycle, rac_llm_component_destroy, rac_llm_component_unload and rac_llm_cancel.sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp (1)
983-1041: Consider extracting duplicated UTF-8 handling logic.

The UTF-8 validation loop, `scanner_state` usage, and safe-stop trimming are nearly identical to `generate_stream` (lines 682-752). Additionally, `STOP_SEQUENCES` and `MAX_STOP_LEN` are defined twice (lines 665-674 and 966-975).
♻️ Suggested approach
Consider extracting to a file-level helper or inner class:
```cpp
namespace {
static const std::vector<std::string> kStopSequences = {
    "<|im_end|>", "<|eot_id|>", "</s>", "<|end|>", "<|endoftext|>",
    "\n\nUser:", "\n\nHuman:",
};
static const size_t kMaxStopLen = []{
    size_t m = 0;
    for (const auto& s : kStopSequences) m = std::max(m, s.size());
    return m;
}();
} // namespace
```

The UTF-8 buffering and stop-sequence detection could be encapsulated in a small struct that both generation paths use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp` around lines 983 - 1041, Extract the duplicated UTF-8 buffering and stop-sequence logic into a shared helper (e.g., a static anonymous-namespace constants kStopSequences and kMaxStopLen and a small Utf8StopDetector/Utf8StopBuffer struct) and replace the inline code in both the streaming loop and generate_stream with calls to that helper; the helper should own Utf8State, partial buffer, stop_window and expose methods to feed new bytes (from common_token_to_piece/new_token_chars), return any fully-valid UTF‑8 chunk, detect/trim stop sequences (mirroring the current found_stop_pos logic), and produce safe trimmed output for appending to generated_text so you can remove duplicated STOP_SEQUENCES, MAX_STOP_LEN, scanner_state, partial_utf8_buffer and stop_window usages in the loops.examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift (1)
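To illustrate what the suggested shared struct might look like, here is a minimal sketch of the hold-back-incomplete-tail idea both generation paths need. The name `Utf8ChunkBuffer` and every detail below are hypothetical, not code from this PR:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: buffers raw token bytes and releases only
// complete UTF-8 sequences, holding back a trailing partial character.
struct Utf8ChunkBuffer {
    std::string pending;

    // Byte length of a UTF-8 sequence starting with `lead`,
    // or 0 if `lead` is a continuation/invalid leading byte.
    static size_t seq_len(unsigned char lead) {
        if (lead < 0x80) return 1;          // ASCII
        if ((lead >> 5) == 0x6) return 2;   // 110xxxxx
        if ((lead >> 4) == 0xE) return 3;   // 1110xxxx
        if ((lead >> 3) == 0x1E) return 4;  // 11110xxx
        return 0;
    }

    // Append raw bytes; returns the longest prefix that ends on a
    // complete UTF-8 character. Incomplete tail bytes stay buffered.
    std::string feed(const std::string& bytes) {
        pending += bytes;
        size_t complete = 0;  // bytes known to end on a boundary
        size_t i = 0;
        while (i < pending.size()) {
            size_t n = seq_len(static_cast<unsigned char>(pending[i]));
            if (n == 0) { complete = ++i; continue; }  // pass invalid byte through
            if (i + n > pending.size()) break;         // incomplete tail
            i += n;
            complete = i;
        }
        std::string out = pending.substr(0, complete);
        pending.erase(0, complete);
        return out;
    }
};
```

A production version would also validate continuation bytes and flush any stale `pending` bytes at end of stream; this sketch only shows the chunk-boundary buffering that both code paths would share.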
103-103: Remove debug print statement.

This `print` call runs in release builds too; route it through `os.Logger` gated to debug builds only, or remove it entirely.

🧹 Proposed fix

```diff
- print("Calculator received args: \(args), using expression: '\(expression)'")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift` at line 103, Remove the debug print statement that logs "Calculator received args" in ToolSettingsView.swift; replace it either by removing the line entirely or by switching to a proper logger (os.Logger) and gating it behind a debug-only build check (e.g., `#if` DEBUG) so it does not emit to production consoles; locate the occurrence by searching for the string "Calculator received args" or the print(...) call and update the surrounding function accordingly.examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift (1)
446-450: Consider using named parameters for enum destructuring.

The positional destructuring works correctly, but if `VoiceSessionEvent` associated value order changes, this code would silently receive misaligned data. Using named parameters would be more resilient:

🧹 Suggested improvement

```diff
- case .responded(let text, _): assistantResponse = text
+ case .responded(text: let text, thinkingContent: _): assistantResponse = text
  case .speaking: sessionState = .speaking; currentStatus = "Speaking..."
- case let .turnCompleted(transcript, response, _, _):
+ case .turnCompleted(transcript: let transcript, response: let response, thinkingContent: _, audio: _):
  currentTranscript = transcript; assistantResponse = response
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift` around lines 446 - 450, The pattern match on VoiceSessionEvent uses positional associated values which is fragile; update the case patterns to use named parameters so each value is matched by label (e.g., change the case .turnCompleted(transcript, response, _, _) to a labeled form like case let .turnCompleted(transcript: transcript, response: response, ...)) and similarly label .responded and other cases if they have associated value labels; then assign currentTranscript, assistantResponse, sessionState, and currentStatus from those named bindings to ensure order changes in VoiceSessionEvent won’t break the logic.examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift (1)
101-127: Prefer Swift-native notification observation here.

This reintroduces selector-based observer management inside a Combine-backed `@MainActor` view model. Using a `NotificationCenter` publisher or async notifications would keep delivery/cancellation in the same Swift concurrency model and remove the manual `removeObserver` path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift` around lines 101 - 127, The view model is using selector-based NotificationCenter observers (subscribeToModelNotifications, handleModelLoaded(_:), handleModelUnloaded, and deinit removeObserver) which reintroduces non-Swift-concurrency observer management; replace them with Swift-native observation by removing addObserver/removeObserver and instead subscribe via NotificationCenter publishers or async notifications and store the subscriptions in a Combine cancellables Set (e.g., a private var cancellables = Set<AnyCancellable>() or use Task/NotificationCenter.notifications(named:) with Task cancellation). Specifically, remove the selector-based registrations in subscribeToModelNotifications, replace them with NotificationCenter.default.publisher(for: Notification.Name("ModelLoaded")) and .sink to update loadedModelSupportsThinking (and a separate publisher for "ModelUnloaded" to set it false), store the AnyCancellable values in cancellables, and drop the manual removeObserver call in deinit so cancellations happen via the cancellables lifecycle (or cancel the Tasks in deinit if using async sequences).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel`+Generation.swift:
- Around line 25-29: The loop is issuing UI updates and notifications on every
token even when Self.stripThinkTags(from:) yields the same displayText; change
the streaming loop (where fullResponse, displayText are computed and
updateMessageContent(at:content:) and NotificationCenter.default.post are
called) to track the last-displayed text (e.g., a local prevDisplayText) and
only call updateMessageContent(at:content:) and post the scroll notification
when displayText != prevDisplayText, updating prevDisplayText after a successful
update; this avoids no-op main-thread churn while still showing visible changes.
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift`:
- Around line 34-35: Remove the duplicate `@AppStorage`("thinkingModeEnabled")
property in ChatInterfaceView and use the shared
Settings/ToolSettingsViewModel's thinkingModeEnabled instead; replace any uses
or bindings of the local thinkingModeEnabled with
toolSettingsViewModel.thinkingModeEnabled (including the badge binding and other
occurrences mentioned around the 502-518 region) so toggles update
SettingsViewModel.shared and avoid the second source of truth.
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift`:
- Around line 122-128: The NSExpression(format:) call can raise an Objective-C
exception for malformed syntax (e.g., "((", "1 + + 2"), so replace the direct
NSExpression(format: cleanedExpression) usage with a safe evaluator: implement
an ObjC bridging helper (e.g., evaluateExpressionSafely(_:)) that wraps
NSExpression(format:) and expressionValue(with:context:) inside `@try/`@catch and
returns nil on exception; then call that helper from ToolSettingsView (use
cleanedExpression and expression variables) and only build the return dictionary
when the helper returns a valid NSNumber, otherwise handle it as an invalid
expression (e.g., skip returning a result or surface an error).
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Vision/VLMCameraView.swift`:
- Around line 56-63: The scenePhase handler can restart the camera even when
this view is offscreen—add a visibility guard: introduce a Bool property (e.g.,
isVisible) updated in onAppear and onDisappear (or via SwiftUI .task/.onChange
for view visibility), and in the .onChange(of: scenePhase) closure check
isVisible before calling setupCameraIfNeeded(); keep the existing calls to
viewModel.stopAutoStreaming() and viewModel.stopCamera() when moving to
background/inactive, but avoid calling setupCameraIfNeeded() on .active unless
isVisible is true.
In `@sdk/runanywhere-commons/src/features/llm/tool_calling.cpp`:
- Around line 718-722: The fallback branch currently always quotes the scalar in
flat_args, turning raw JSON literals into strings; update the logic in
tool_calling.cpp (the branch using kval_is_obj, kval, escaped_key,
escape_json_string, flat_args) to detect actual JSON literals and emit them
unquoted. Add or call a helper (e.g., is_json_literal(kval)) that returns true
for numbers, true/false, null, and for array/object text (or reuse kval_is_obj
for structures), and only run escape_json_string and wrap in quotes when the
value is a real string; otherwise append the raw kval as-is to preserve typed
literals and follow the structured-types guideline. Ensure the helper is
documented/implemented in the same translation unit and used in the same
conditional chain that currently handles kval and kval_is_obj.
- Around line 401-425: The scalar-value branch currently treats a leading '[' as
a scalar and stops at the first ',' or ']', truncating arrays; modify the
scanner that assigns *out_value and *out_is_object so that if json_obj[pos] ==
'[' you perform a matching-bracket scan (similar to how '{' is handled) to find
the closing ']' including nested brackets and strings, allocate the full
substring (including brackets), set *out_is_object = true (or the same
structured flag used for objects), and return true; otherwise keep the existing
scalar logic for true scalars. Ensure you use the same allocation/memory
conventions as the existing object path and reuse variable names json_obj, pos,
val_start/val_end, *out_value and *out_is_object to locate the change.
- Around line 705-711: The loop is currently marking is_tool_key true for any
alias in TOOL_NAME_KEYS and skipping all such keys, causing legitimate args like
"name" to be dropped; modify the logic in tool_calling.cpp (around the loop over
TOOL_NAME_KEYS) to record which specific alias matched (e.g., matched_key or
matched_index) when comparing k against TOOL_NAME_KEYS, then after the loop only
skip the map entry if k equals that matched key (or index matches matched_index)
instead of skipping whenever any alias exists; update any uses of is_tool_key to
use the matched_key check so only the actual tool-name key is excluded.
In `@sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/CppBridge.swift`:
- Around line 182-189: The shutdown() function currently spawns a
fire-and-forget Task which returns before native components finish tearing down;
change shutdown() to be async (remove the Task wrapper), sequentially await each
component's destroy() call (LLM.shared.destroy(), STT.shared.destroy(),
TTS.shared.destroy(), VAD.shared.destroy(), VoiceAgent.shared.destroy(),
VLM.shared.destroy()) so teardown completes before proceeding, and only after
all awaits clear _isInitialized and _servicesInitialized; then update the caller
RunAnywhere.reset() to await CppBridge.shutdown() and mark reset() async.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere`+TextGeneration.swift:
- Around line 100-110: The LLMGenerationResult currently misreports
thinkingTokens (non-streaming sets it to outputTokens when any thinkingContent
exists, while streaming returns 0), causing inconsistent and double-counted
metrics; change both code paths that construct LLMGenerationResult (the
non-streaming return here and the similar block around the 450-460 range) to set
thinkingTokens to 0 (and ensure responseTokens/outputTokens remain unchanged)
until the C++ layer exposes a real thinking-token metric, so both streaming and
non-streaming results return a consistent zero-valued thinkingTokens.
- Around line 312-335: The current ThinkingContentParser.extract(from:) only
removes the first <think>...</think> pair; update extract to scan the entire
input (e.g., with a while loop or regex) to find and remove all complete
<think>...</think> blocks, concatenating their inner contents (trimmed) into a
single thinking string, and rebuild response text from the remaining parts; also
strip any stray "<think>" or "</think>" leftovers after extraction so incomplete
fragments do not appear in the returned text. Ensure you modify
ThinkingContentParser.extract to collect all matches, remove them from the
source, and return (text: cleanedResponseText, thinking: combinedThinkingOrNil).
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/RunAnywhere`+TTS.swift:
- Around line 189-190: TTSStreamContext currently marked `@unchecked` Sendable
with an unprotected mutable totalData that is mutated from the C callback;
change TTSStreamContext to implement Sendable safely by adding an
OSAllocatedUnfairLock (matching the STT bridge pattern) and use it to guard all
accesses/mutations of totalData (wrap mutations in the C callback and the read
in the finalization path where
Unmanaged<TTSStreamContext>.fromOpaque(...).takeRetainedValue() is used). Mirror
the STT implementation (use the same lock property name and lock/unlock
placement) so totalData is only read/written while holding the
OSAllocatedUnfairLock.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift:
- Around line 224-229: The logic for adding the "/no_think" prefix in
RunAnywhere+VoiceSession.swift currently uses only config.thinkingModeEnabled;
align it with the pattern in RAGViewModel by ensuring the decision also
considers whether the model supports thinking (i.e., the same combined condition
used when building the config elsewhere). Update the construction or usage so
that effectivePrompt is prefixed with "/no_think" only when thinkingMode is
disabled AND the model supports thinking was considered when the config was
created (mirror the check used in RAGViewModel and VoiceAgentViewModel where
thinkingModeEnabled is set as settings.loadedModelSupportsThinking &&
settings.thinkingModeEnabled).
- Around line 258-263: The snippet currently emits .turnCompleted with variables
transcription, cleanedResponse, thinkingContent, and synthesizedAudio even when
an error occurs in the preceding do block; change the control flow so
.turnCompleted is only emitted when the do block completes successfully (i.e.,
move the emit(.turnCompleted(...)) into the success path of the do block or
guard for a successful result before emitting), and ensure the catch path either
emits an error-specific event (or returns/throws) instead of fallthrough
emission; reference the existing symbols transcription, cleanedResponse,
thinkingContent, synthesizedAudio and the emit(.turnCompleted(...)) call when
making the change.
---
Outside diff comments:
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift`:
- Around line 417-421: The scrolling target "typing" disappears once the first
token arrives, so change scroll target logic used by
scrollToBottom(proxy:animated:) to choose a stable anchor: when
viewModel.isGenerating is true but viewModel.messages.last?.content.isEmpty ==
false, use the last message's id (e.g., viewModel.messages.last?.id) instead of
"typing", or create a persistent bottom anchor view ID (e.g., "bottom-anchor")
and always scroll to that; update the TypingIndicatorView id usage
(id("typing")) and the code that calls scrollToBottom(proxy:animated:) to select
the last message ID when content is non-empty or fall back to the persistent
"bottom-anchor" so scroll targets never point to a removed view.
In
`sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+ToolCalling.swift`:
- Around line 208-214: The follow-up prompt builder (buildFollowUpPrompt) is
embedding the original prompt which may contain the `/no_think` prefix in the
middle of the assembled message, so the C++ layer loses the required leading
`/no_think`. Fix by ensuring the leading `/no_think` is preserved: either strip
and capture the `/no_think` prefix from prompt into cleanPrompt before calling
buildFollowUpPrompt and pass cleanPrompt instead of prompt, or modify
buildFollowUpPrompt to detect and reapply the `/no_think` prefix at the very
start of the returned string; update the call that sets fullPrompt to pass
cleanPrompt (or reapply prefix) and keep references to buildFollowUpPrompt,
fullPrompt, prompt, and cleanPrompt consistent.
---
Nitpick comments:
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Models/ModelListViewModel.swift`:
- Line 136: The early return in selectModel guarded by isLoadingModel makes taps
silently no-op; update selectModel (in ModelListViewModel) to give feedback by
either setting the same loading state used in loadModelsFromRegistry (e.g., set
isLoading = true / isLoadingModel = true at the start of selectModel) so the UI
shows a spinner, or log/emit a minimal message when isLoadingModel is true to
indicate a load is in progress; ensure you clear the loading flag on
completion/failure just like loadModelsFromRegistry to avoid leaving the view
stuck.
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift`:
- Around line 101-127: The view model is using selector-based NotificationCenter
observers (subscribeToModelNotifications, handleModelLoaded(_:),
handleModelUnloaded, and deinit removeObserver) which reintroduces
non-Swift-concurrency observer management; replace them with Swift-native
observation by removing addObserver/removeObserver and instead subscribe via
NotificationCenter publishers or async notifications and store the subscriptions
in a Combine cancellables Set (e.g., a private var cancellables =
Set<AnyCancellable>() or use Task/NotificationCenter.notifications(named:) with
Task cancellation). Specifically, remove the selector-based registrations in
subscribeToModelNotifications, replace them with
NotificationCenter.default.publisher(for: Notification.Name("ModelLoaded")) and
.sink to update loadedModelSupportsThinking (and a separate publisher for
"ModelUnloaded" to set it false), store the AnyCancellable values in
cancellables, and drop the manual removeObserver call in deinit so cancellations
happen via the cancellables lifecycle (or cancel the Tasks in deinit if using
async sequences).
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift`:
- Line 103: Remove the debug print statement that logs "Calculator received
args" in ToolSettingsView.swift; replace it either by removing the line entirely
or by switching to a proper logger (os.Logger) and gating it behind a debug-only
build check (e.g., #if DEBUG) so it does not emit to production consoles; locate
the occurrence by searching for the string "Calculator received args" or the
print(...) call and update the surrounding function accordingly.
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift`:
- Around line 446-450: The pattern match on VoiceSessionEvent uses positional
associated values which is fragile; update the case patterns to use named
parameters so each value is matched by label (e.g., change the case
.turnCompleted(transcript, response, _, _) to a labeled form like case let
.turnCompleted(transcript: transcript, response: response, ...)) and similarly
label .responded and other cases if they have associated value labels; then
assign currentTranscript, assistantResponse, sessionState, and currentStatus
from those named bindings to ensure order changes in VoiceSessionEvent won’t
break the logic.
In `sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp`:
- Around line 983-1041: Extract the duplicated UTF-8 buffering and stop-sequence
logic into a shared helper (e.g., static anonymous-namespace constants
kStopSequences and kMaxStopLen plus a small Utf8StopDetector/Utf8StopBuffer
struct) and replace the inline code in both the streaming loop and
generate_stream with calls to that helper; the helper should own Utf8State,
partial buffer, stop_window and expose methods to feed new bytes (from
common_token_to_piece/new_token_chars), return any fully-valid UTF‑8 chunk,
detect/trim stop sequences (mirroring the current found_stop_pos logic), and
produce safe trimmed output for appending to generated_text so you can remove
duplicated STOP_SEQUENCES, MAX_STOP_LEN, scanner_state, partial_utf8_buffer and
stop_window usages in the loops.
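The refactor described above can be sketched roughly as follows. This is an illustrative stand-in, not the backend's actual code: the struct and method names (Utf8StopBuffer, feed, hit_stop) are invented here, the UTF-8 handling is simplified to lead-byte lengths without continuation-byte validation, and trimming the stop text out of generated_text is omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical shared helper: buffers incomplete UTF-8 sequences and watches
// emitted text for stop sequences, so both streaming paths can reuse it.
struct Utf8StopBuffer {
    std::string partial;               // bytes of a trailing incomplete sequence
    std::string window;                // emitted text, scanned for stop sequences
    std::vector<std::string> stops;

    explicit Utf8StopBuffer(std::vector<std::string> stop_seqs)
        : stops(std::move(stop_seqs)) {}

    // Expected byte length of a UTF-8 sequence from its lead byte.
    static size_t seq_len(unsigned char b) {
        if (b < 0x80) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return 0;                      // continuation or invalid lead byte
    }

    // Feed raw bytes from the tokenizer; returns the prefix made of complete
    // UTF-8 sequences, holding back a trailing partial sequence for later.
    std::string feed(const std::string& bytes) {
        partial += bytes;
        size_t emit_end = 0;
        while (emit_end < partial.size()) {
            size_t n = seq_len(static_cast<unsigned char>(partial[emit_end]));
            if (n == 0) { emit_end++; continue; }     // stray byte: pass through
            if (emit_end + n > partial.size()) break; // incomplete: hold back
            emit_end += n;
        }
        std::string out = partial.substr(0, emit_end);
        partial.erase(0, emit_end);
        window += out;
        return out;
    }

    // True once any stop sequence has appeared in the emitted text.
    bool hit_stop() const {
        for (const auto& s : stops)
            if (window.find(s) != std::string::npos) return true;
        return false;
    }
};
```

In this shape, both the streaming loop and generate_stream would construct one instance up front and call feed/hit_stop per decoded token instead of duplicating the scanner state.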
In `sdk/runanywhere-commons/src/features/llm/llm_component.cpp`:
- Around line 772-788: The cancel implementation reads component->lifecycle
without holding the component mutex which can race with destruction/unload; to
fix, either document the threading constraint in the public header (declare that
rac_llm_component_cancel must not be called concurrently with
rac_llm_component_destroy or rac_llm_component_unload) or add a defensive
check/assert at the top of rac_llm_component_cancel (e.g.,
assert(component->lifecycle && "cannot cancel during destroy/unload") or return
an error if lifecycle is null) before calling rac_lifecycle_get_service and
rac_llm_cancel, referencing rac_llm_component_cancel, component->lifecycle,
rac_llm_component_destroy, rac_llm_component_unload and rac_llm_cancel.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f15418f-6046-4e3d-9175-cb636f3efb2f
📒 Files selected for processing (27)
examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel+Events.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel+Generation.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel+ModelManagement.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel+ToolCalling.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Models/ModelListViewModel.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/ViewModels/RAGViewModel.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/CombinedSettingsView.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Vision/VLMCameraView.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAssistantView.swift
sdk/runanywhere-commons/src/backends/llamacpp/llamacpp_backend.cpp
sdk/runanywhere-commons/src/features/llm/llm_component.cpp
sdk/runanywhere-commons/src/features/llm/tool_calling.cpp
sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/CppBridge.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Infrastructure/Download/Utilities/ArchiveUtility.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+TextGeneration.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+ToolCalling.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/RunAnywhere+TTS.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceSession.swift
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/VoiceAgentTypes.swift
for try await token in stream {
    fullResponse += token
    await updateMessageContent(at: messageIndex, content: fullResponse)
    let displayText = Self.stripThinkTags(from: fullResponse)
    await updateMessageContent(at: messageIndex, content: displayText)
    NotificationCenter.default.post(
Skip no-op UI updates while the model is still inside <think>.
stripThinkTags can keep displayText unchanged for many hidden tokens, but this loop still rebuilds the message and posts a scroll notification on every token. On long thinking traces that becomes avoidable main-thread churn.
Suggested simplification
func generateStreamingResponse(
prompt: String,
options: LLMGenerationOptions,
messageIndex: Int
) async throws {
var fullResponse = ""
+ var lastDisplayText = ""
let streamingResult = try await RunAnywhere.generateStream(prompt, options: options)
let stream = streamingResult.stream
let metricsTask = streamingResult.result
for try await token in stream {
fullResponse += token
let displayText = Self.stripThinkTags(from: fullResponse)
+ guard displayText != lastDisplayText else { continue }
+ lastDisplayText = displayText
await updateMessageContent(at: messageIndex, content: displayText)
NotificationCenter.default.post(
name: Notification.Name("MessageContentUpdated"),
object: nil
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:
for try await token in stream {
    fullResponse += token
    await updateMessageContent(at: messageIndex, content: fullResponse)
    let displayText = Self.stripThinkTags(from: fullResponse)
    await updateMessageContent(at: messageIndex, content: displayText)
    NotificationCenter.default.post(
Suggested:
var fullResponse = ""
var lastDisplayText = ""
let streamingResult = try await RunAnywhere.generateStream(prompt, options: options)
let stream = streamingResult.stream
let metricsTask = streamingResult.result
for try await token in stream {
    fullResponse += token
    let displayText = Self.stripThinkTags(from: fullResponse)
    guard displayText != lastDisplayText else { continue }
    lastDisplayText = displayText
    await updateMessageContent(at: messageIndex, content: displayText)
    NotificationCenter.default.post(
        name: Notification.Name("MessageContentUpdated"),
        object: nil
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/ViewModels/LLMViewModel+Generation.swift`
around lines 25 - 29, The loop is issuing UI updates and notifications on every
token even when Self.stripThinkTags(from:) yields the same displayText; change
the streaming loop (where fullResponse, displayText are computed and
updateMessageContent(at:content:) and NotificationCenter.default.post are
called) to track the last-displayed text (e.g., a local prevDisplayText) and
only call updateMessageContent(at:content:) and post the scroll notification
when displayText != prevDisplayText, updating prevDisplayText after a successful
update; this avoids no-op main-thread churn while still showing visible changes.
@ObservedObject private var toolSettingsViewModel = ToolSettingsViewModel.shared
@AppStorage("thinkingModeEnabled") private var thinkingModeEnabled = false
Avoid a second source of truth for thinking mode.
SettingsViewModel already owns thinkingModeEnabled and persists it. Toggling a separate @AppStorage copy here can update UserDefaults without updating SettingsViewModel.shared.thinkingModeEnabled, so generation code that reads the shared settings can stay stale for the rest of the session. Bind the badge to the shared settings view model instead of duplicating the state.
Suggested direction
- @ObservedObject private var toolSettingsViewModel = ToolSettingsViewModel.shared
- @AppStorage("thinkingModeEnabled") private var thinkingModeEnabled = false
+ @ObservedObject private var toolSettingsViewModel = ToolSettingsViewModel.shared
+ @ObservedObject private var settingsViewModel = SettingsViewModel.shared
...
- if thinkingModeEnabled && viewModel.loadedModelSupportsThinking {
+ if settingsViewModel.thinkingModeEnabled && viewModel.loadedModelSupportsThinking {
thinkingModeBadge
}
...
- .padding(.top, ((thinkingModeEnabled && viewModel.loadedModelSupportsThinking) || viewModel.useToolCalling || !viewModel.loraAdapters.isEmpty || hasModelSelected) ? 8 : 0)
+ .padding(.top, ((settingsViewModel.thinkingModeEnabled && viewModel.loadedModelSupportsThinking) || viewModel.useToolCalling || !viewModel.loraAdapters.isEmpty || hasModelSelected) ? 8 : 0)
...
var thinkingModeBadge: some View {
Button {
- thinkingModeEnabled.toggle()
+ settingsViewModel.thinkingModeEnabled.toggle()
} label: {
Also applies to: 502-518
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Views/ChatInterfaceView.swift`
around lines 34 - 35, Remove the duplicate `@AppStorage`("thinkingModeEnabled")
property in ChatInterfaceView and use the shared
Settings/ToolSettingsViewModel's thinkingModeEnabled instead; replace any uses
or bindings of the local thinkingModeEnabled with
toolSettingsViewModel.thinkingModeEnabled (including the badge binding and other
occurrences mentioned around the 502-518 region) so toggles update
SettingsViewModel.shared and avoid the second source of truth.
let exp = NSExpression(format: cleanedExpression)
if let result = exp.expressionValue(with: nil, context: nil) as? NSNumber {
    return [
        "result": .number(result.doubleValue),
        "expression": .string(expression)
    ]
}
NSExpression(format:) can crash on malformed expressions.
The character validation (lines 112-120) prevents injection but doesn't validate syntax. Expressions like "1 + + 2", "1 /", or "((" pass character validation but will crash NSExpression(format:).
🛡️ Proposed fix: wrap NSExpression in a safe evaluation
- let exp = NSExpression(format: cleanedExpression)
- if let result = exp.expressionValue(with: nil, context: nil) as? NSNumber {
- return [
- "result": .number(result.doubleValue),
- "expression": .string(expression)
- ]
+ do {
+ let exp = NSExpression(format: cleanedExpression)
+ if let result = exp.expressionValue(with: nil, context: nil) as? NSNumber {
+ return [
+ "result": .number(result.doubleValue),
+ "expression": .string(expression)
+ ]
+ }
+ } catch {
+ // NSExpression throws on invalid syntax
}
Note: NSExpression(format:) doesn't throw but can raise an Objective-C exception. Consider using NSExpression(format:argumentArray:) with proper validation or a dedicated math parser library for robustness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/ToolSettingsView.swift`
around lines 122 - 128, The NSExpression(format:) call can raise an Objective-C
exception for malformed syntax (e.g., "((", "1 + + 2"), so replace the direct
NSExpression(format: cleanedExpression) usage with a safe evaluator: implement
an ObjC bridging helper (e.g., evaluateExpressionSafely(_:)) that wraps
NSExpression(format:) and expressionValue(with:context:) inside @try/@catch and
returns nil on exception; then call that helper from ToolSettingsView (use
cleanedExpression and expression variables) and only build the return dictionary
when the helper returns a valid NSNumber, otherwise handle it as an invalid
expression (e.g., skip returning a result or surface an error).
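For illustration, a structural pre-check of the kind implied above can look like the following. This is a hedged sketch, not the app's code (the app is Swift; C++ is used here to keep this review's examples in one language), and it deliberately ignores unary minus and named functions.

```cpp
#include <cctype>
#include <string>

// Checks that an arithmetic expression is structurally well-formed, catching
// inputs like "1 + + 2", "1 /", or "((" that pass a character whitelist.
bool is_valid_arithmetic(const std::string& expr) {
    int depth = 0;
    bool expect_operand = true;  // expecting a number or '(' next
    size_t i = 0;
    while (i < expr.size()) {
        unsigned char c = static_cast<unsigned char>(expr[i]);
        if (std::isspace(c)) { i++; continue; }
        if (expect_operand) {
            if (c == '(') { depth++; i++; }
            else if (std::isdigit(c)) {
                // consume a full number literal (digits and optional '.')
                while (i < expr.size() &&
                       (std::isdigit(static_cast<unsigned char>(expr[i])) || expr[i] == '.'))
                    i++;
                expect_operand = false;
            } else return false;
        } else {
            if (c == ')') { if (--depth < 0) return false; i++; }
            else if (c == '+' || c == '-' || c == '*' || c == '/') {
                expect_operand = true; i++;
            } else return false;
        }
    }
    return depth == 0 && !expect_operand;  // no dangling operator or open paren
}
```

Rejecting such strings up front means the evaluator only ever sees expressions that are at least structurally sound, whatever exception-handling strategy is chosen.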
.onChange(of: scenePhase) { _, newPhase in
    if newPhase == .background || newPhase == .inactive {
        viewModel.stopAutoStreaming()
        viewModel.stopCamera()
    } else if newPhase == .active {
        setupCameraIfNeeded()
    }
}
Potential camera restart when view is not visible.
When the user navigates away, .onDisappear stops the camera. However, if the view remains in the navigation stack's hierarchy, .onChange(of: scenePhase) may still fire when the app becomes active, causing setupCameraIfNeeded() to restart the camera even though the user is on a different screen.
Consider tracking visibility state to guard against this:
🛠️ Proposed fix using visibility tracking
@State private var showingPhotos = false
@State private var selectedPhoto: PhotosPickerItem?
+@State private var isViewVisible = false
@Environment(\.scenePhase) private var scenePhase
-.onAppear { setupCameraIfNeeded() }
+.onAppear {
+ isViewVisible = true
+ setupCameraIfNeeded()
+}
.onDisappear {
+ isViewVisible = false
viewModel.stopAutoStreaming()
viewModel.stopCamera()
}
.onChange(of: scenePhase) { _, newPhase in
if newPhase == .background || newPhase == .inactive {
viewModel.stopAutoStreaming()
viewModel.stopCamera()
- } else if newPhase == .active {
+ } else if newPhase == .active && isViewVisible {
setupCameraIfNeeded()
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Vision/VLMCameraView.swift`
around lines 56 - 63, The scenePhase handler can restart the camera even when
this view is offscreen—add a visibility guard: introduce a Bool property (e.g.,
isVisible) updated in onAppear and onDisappear (or via SwiftUI .task/.onChange
for view visibility), and in the .onChange(of: scenePhase) closure check
isVisible before calling setupCameraIfNeeded(); keep the existing calls to
viewModel.stopAutoStreaming() and viewModel.stopCamera() when moving to
background/inactive, but avoid calling setupCameraIfNeeded() on .active unless
isVisible is true.
} else {
    // Scalar value (number, boolean, null)
    // Read until comma, closing brace, or whitespace
    size_t val_start = pos;
    size_t val_end = pos;
    while (val_end < len && json_obj[val_end] != ',' &&
           json_obj[val_end] != '}' && json_obj[val_end] != ']' &&
           json_obj[val_end] != '\n') {
        val_end++;
    }
    // Trim trailing whitespace
    while (val_end > val_start &&
           (json_obj[val_end - 1] == ' ' || json_obj[val_end - 1] == '\t')) {
        val_end--;
    }
    if (val_end > val_start) {
        size_t val_len = val_end - val_start;
        *out_value = static_cast<char*>(malloc(val_len + 1));
        if (*out_value) {
            memcpy(*out_value, json_obj + val_start, val_len);
            (*out_value)[val_len] = '\0';
        }
        *out_is_object = false;
        return true;
    }
Array-valued arguments are truncated in the new scalar path.
Line 401 sends [ into the scalar branch, but that scanner stops on the first , or ]. Inputs like {"items":[1,2]} get extracted as "[1", so any RAC_TOOL_PARAM_ARRAY argument is corrupted before parsing completes. Handle [ as structured JSON with a matching-bracket scan instead of treating it as a scalar literal.
As per coding guidelines, sdk/runanywhere-commons/**/*.{c,cpp,h,hpp}: "Always use structured types, never use strings directly for consistency and scalability"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `sdk/runanywhere-commons/src/features/llm/tool_calling.cpp` around lines 401 -
425, The scalar-value branch currently treats a leading '[' as a scalar and
stops at the first ',' or ']', truncating arrays; modify the scanner that
assigns *out_value and *out_is_object so that if json_obj[pos] == '[' you
perform a matching-bracket scan (similar to how '{' is handled) to find the
closing ']' including nested brackets and strings, allocate the full substring
(including brackets), set *out_is_object = true (or the same structured flag
used for objects), and return true; otherwise keep the existing scalar logic for
true scalars. Ensure you use the same allocation/memory conventions as the
existing object path and reuse variable names json_obj, pos, val_start/val_end,
*out_value and *out_is_object to locate the change.
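A minimal sketch of the matching-bracket scan the prompt calls for, shown with std::string rather than the file's C-style buffers for brevity (the function name scan_balanced is illustrative):

```cpp
#include <string>

// Given json[pos] == '[' or '{', returns the index one past the matching
// close bracket, counting nested same-type brackets and skipping brackets
// inside quoted strings; returns std::string::npos if unbalanced.
size_t scan_balanced(const std::string& json, size_t pos) {
    char open = json[pos];
    char close = (open == '[') ? ']' : '}';
    int depth = 0;
    bool in_string = false;
    for (size_t i = pos; i < json.size(); i++) {
        char c = json[i];
        if (in_string) {
            if (c == '\\') i++;              // skip the escaped character
            else if (c == '"') in_string = false;
        } else if (c == '"') in_string = true;
        else if (c == open) depth++;
        else if (c == close && --depth == 0) return i + 1;
    }
    return std::string::npos;
}
```

With this, the array branch can allocate the full substring (brackets included) instead of stopping at the first ',' or ']', so {"items":[1,2]} yields "[1,2]" rather than "[1".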
return LLMGenerationResult(
    text: generatedText,
-   thinkingContent: nil,
+   thinkingContent: thinkingContent,
    inputTokens: inputTokens,
    tokensUsed: outputTokens,
    modelUsed: modelId,
    latencyMs: totalTimeMs,
    framework: "llamacpp",
    tokensPerSecond: tokensPerSecond,
    timeToFirstTokenMs: nil,
-   thinkingTokens: 0,
+   thinkingTokens: thinkingContent.map { _ in outputTokens } ?? 0,
thinkingTokens is misreported and inconsistent across paths.
The non-streaming result uses the entire completion count as thinkingTokens whenever any <think> block exists, while the streaming result always returns 0. Because responseTokens is also set to outputTokens, this can double-count the same completion in analytics. Please return a consistent value in both paths until the C++ layer exposes a dedicated thinking-token metric.
Also applies to: 450-460
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+TextGeneration.swift`
around lines 100 - 110, The LLMGenerationResult currently misreports
thinkingTokens (non-streaming sets it to outputTokens when any thinkingContent
exists, while streaming returns 0), causing inconsistent and double-counted
metrics; change both code paths that construct LLMGenerationResult (the
non-streaming return here and the similar block around the 450-460 range) to set
thinkingTokens to 0 (and ensure responseTokens/outputTokens remain unchanged)
until the C++ layer exposes a real thinking-token metric, so both streaming and
non-streaming results return a consistent zero-valued thinkingTokens.
enum ThinkingContentParser {
    /// Extracts `<think>...</think>` content from generated text.
    /// - Returns: Tuple of (responseText, thinkingContent). If no tags found, responseText = original text, thinkingContent = nil.
    static func extract(from text: String) -> (text: String, thinking: String?) {
        guard let startRange = text.range(of: "<think>"),
              let endRange = text.range(of: "</think>"),
              startRange.upperBound <= endRange.lowerBound else {
            return (text: text, thinking: nil)
        }
        let thinkingContent = String(text[startRange.upperBound..<endRange.lowerBound])
            .trimmingCharacters(in: .whitespacesAndNewlines)
        // Include any text before <think> and after </think>
        let textBefore = String(text[..<startRange.lowerBound])
            .trimmingCharacters(in: .whitespacesAndNewlines)
        let textAfter = String(text[endRange.upperBound...])
            .trimmingCharacters(in: .whitespacesAndNewlines)
        let responseText = [textBefore, textAfter]
            .filter { !$0.isEmpty }
            .joined(separator: "\n")
        return (
            text: responseText,
            thinking: thinkingContent.isEmpty ? nil : thinkingContent
        )
    }
The parser only removes the first <think> block.
extract(from:) stops after one closing tag, so additional or trailing incomplete think sections remain in text. The app-side LLMViewModel.stripThinkTags already handles repeated/incomplete blocks, which means SDK results can still surface raw think markup in non-streaming and final streaming results. Please make this parser consume all think blocks or centralize one shared implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/LLM/RunAnywhere+TextGeneration.swift`
around lines 312 - 335, The current ThinkingContentParser.extract(from:) only
removes the first <think>...</think> pair; update extract to scan the entire
input (e.g., with a while loop or regex) to find and remove all complete
<think>...</think> blocks, concatenating their inner contents (trimmed) into a
single thinking string, and rebuild response text from the remaining parts; also
strip any stray "<think>" or "</think>" leftovers after extraction so incomplete
fragments do not appear in the returned text. Ensure you modify
ThinkingContentParser.extract to collect all matches, remove them from the
source, and return (text: cleanedResponseText, thinking: combinedThinkingOrNil).
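The requested multi-block behavior can be sketched as follows. This is an illustrative stand-in in portable C++ (the SDK parser itself is Swift), simplified in that it joins blocks with newlines, skips whitespace trimming, and only strips stray tags rather than incomplete fragment content:

```cpp
#include <string>
#include <utility>

// Removes every complete <think>...</think> block from text, returning
// {cleanedText, combinedThinking}; stray tags from incomplete blocks are
// stripped from the cleaned text.
std::pair<std::string, std::string> extract_thinking(std::string text) {
    const std::string open = "<think>", close = "</think>";
    std::string thinking;
    size_t start;
    while ((start = text.find(open)) != std::string::npos) {
        size_t end = text.find(close, start + open.size());
        if (end == std::string::npos) break;  // incomplete block: handled below
        if (!thinking.empty()) thinking += "\n";
        thinking += text.substr(start + open.size(), end - start - open.size());
        text.erase(start, end + close.size() - start);
    }
    // Strip any stray tags left behind by incomplete fragments.
    for (const std::string& tag : {open, close}) {
        size_t p;
        while ((p = text.find(tag)) != std::string::npos) text.erase(p, tag.size());
    }
    return {text, thinking};
}
```

Centralizing one implementation with this shape would keep the SDK parser and the app-side stripThinkTags from diverging.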
let finalContext = Unmanaged<TTSStreamContext>.fromOpaque(contextPtr).takeRetainedValue()
let totalAudioData = finalContext.totalData
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the native TTS streaming implementation for callback lifetime/threading.
fd 'rac_tts_component' -HI | while read -r file; do
rg -n -C5 'rac_tts_component_synthesize_stream|user_data|callback|dispatch_async|std::thread|async|queue|worker' "$file"
done
# Compare Swift-side retain/release patterns used by other streaming bridges.
rg -n -C4 'passRetained|takeRetainedValue|takeUnretainedValue' \
  sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 21787
🏁 Script executed:
# Check if TTSStreamContext and STTStreamingContext have similar definitions and usage
rg -n -A 10 'class TTSStreamContext|struct TTSStreamContext' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/
# Check STT context definition for comparison
rg -n -A 10 'class STTStreamingContext|struct STTStreamingContext' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/STT/
# Check if there are any comments in TTS about synchronous behavior
rg -n -B 5 -A 5 'rac_tts_component_synthesize_stream' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/RunAnywhere+TTS.swift
# Search for any TODOs or FIXMEs related to threading/concurrency in TTS
rg -n 'TODO|FIXME|SAFETY|thread|async|race' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/RunAnywhere+TTS.swift
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 3399
🏁 Script executed:
# Check how STT uses finalText in callbacks
rg -n -B 2 -A 2 'finalText' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/STT/RunAnywhere+STT.swift | head -40
# Check if there are any comments explaining the synchronization choice in STT
rg -n -B 5 -A 0 'OSAllocatedUnfairLock' sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/STT/RunAnywhere+STT.swift
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1068
Protect totalData with a Swift 6 synchronization primitive; mirror the STT implementation.
Line 311 declares TTSStreamContext as @unchecked Sendable with an unguarded mutable totalData, which is mutated directly in the C callback at line 183. The sibling STT bridge (sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/STT/RunAnywhere+STT.swift:306-313) solves this correctly using OSAllocatedUnfairLock to protect its mutable state. Replace the @unchecked Sendable with a proper Sendable implementation using OSAllocatedUnfairLock for totalData to guard against concurrent callback access, consistent with Swift 6 APIs and the coding guidelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/TTS/RunAnywhere+TTS.swift`
around lines 189 - 190, TTSStreamContext currently marked `@unchecked` Sendable
with an unprotected mutable totalData that is mutated from the C callback;
change TTSStreamContext to implement Sendable safely by adding an
OSAllocatedUnfairLock (matching the STT bridge pattern) and use it to guard all
accesses/mutations of totalData (wrap mutations in the C callback and the read
in the finalization path where
Unmanaged<TTSStreamContext>.fromOpaque(...).takeRetainedValue() is used). Mirror
the STT implementation (use the same lock property name and lock/unlock
placement) so totalData is only read/written while holding the
OSAllocatedUnfairLock.
let effectivePrompt: String
if !config.thinkingModeEnabled {
    effectivePrompt = "/no_think\n\(transcription)"
} else {
    effectivePrompt = transcription
}
Inconsistent thinking mode logic compared to other files.
This code applies /no_think prefix based solely on config.thinkingModeEnabled, but RAGViewModel (line 112) checks both loadedModelSupportsThinking && !thinkingModeEnabled.
If a model doesn't support thinking at all, the prefix is unnecessary and might confuse it. Consider aligning with the pattern used elsewhere:
🔧 Suggested alignment
The VoiceSessionConfig would need to also carry whether the model supports thinking, or the check should happen at the config construction site (which VoiceAgentViewModel already does at line 392).
If the config is always constructed with proper logic (only setting thinkingModeEnabled=true when the model supports it), the current code is fine. The construction in VoiceAgentViewModel line 392 does handle this correctly:
thinkingModeEnabled: settings.loadedModelSupportsThinking && settings.thinkingModeEnabled
So this is safe, but the semantics are subtly different across files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceSession.swift`
around lines 224 - 229, The logic for adding the "/no_think" prefix in
RunAnywhere+VoiceSession.swift currently uses only config.thinkingModeEnabled;
align it with the pattern in RAGViewModel by ensuring the decision also
considers whether the model supports thinking (i.e., the same combined condition
used when building the config elsewhere). Update the construction or usage so
that effectivePrompt is prefixed with "/no_think" only when thinkingMode is
disabled AND the model supports thinking was considered when the config was
created (mirror the check used in RAGViewModel and VoiceAgentViewModel where
thinkingModeEnabled is set as settings.loadedModelSupportsThinking &&
settings.thinkingModeEnabled).
```swift
emit(.turnCompleted(
    transcript: transcription,
    response: cleanedResponse,
    thinkingContent: thinkingContent,
    audio: synthesizedAudio
))
```
turnCompleted emitted even after errors with potentially empty/stale values.
If an error occurs in the do block (lines 208-256), the code falls through to emit turnCompleted (lines 258-263) with whatever values transcription, cleanedResponse, thinkingContent, and synthesizedAudio held at the time of failure. This could emit partial results (e.g., transcription succeeded but generation failed).
Consider whether turnCompleted should be skipped on errors, or if partial results are intentionally desired.
🔧 Potential fix if partial results are not desired
```diff
 } catch {
     logger.error("Processing failed: \(error)")
     emit(.error(error.localizedDescription))
+    // Skip turnCompleted on error - error event is sufficient
+    if config.continuousMode && isRunning {
+        try? await startListening()
+    }
+    return
 }
 emit(.turnCompleted(
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift
around lines 258 - 263, The snippet currently emits .turnCompleted with
variables transcription, cleanedResponse, thinkingContent, and synthesizedAudio
even when an error occurs in the preceding do block; change the control flow so
.turnCompleted is only emitted when the do block completes successfully (i.e.,
move the emit(.turnCompleted(...)) into the success path of the do block or
guard for a successful result before emitting), and ensure the catch path either
emits an error-specific event (or returns/throws) instead of fallthrough
emission; reference the existing symbols transcription, cleanedResponse,
thinkingContent, synthesizedAudio and the emit(.turnCompleted(...)) call when
making the change.
🧹 Nitpick comments (1)
examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift (1)
67-67: Consider adding a comment explaining the early initialization.

The `_ = SettingsViewModel.shared` forces the singleton to initialize before `initializeSDK()` runs. This ensures the `loadSettings()`, `setupObservers()`, and `subscribeToModelNotifications()` setup completes before views access the shared instance. The pattern is valid, but a brief comment would clarify intent for future maintainers.

📝 Suggested clarification

```diff
 .task {
+    // Ensure SettingsViewModel singleton is initialized before SDK setup
+    // (subscriptions and settings must be ready before views access it)
     _ = SettingsViewModel.shared
     logger.info("🏁 App launched, initializing SDK...")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift` at line 67, Add a brief inline comment above the `_ = SettingsViewModel.shared` line explaining that this forces early initialization of the SettingsViewModel singleton so loadSettings(), setupObservers(), and subscribeToModelNotifications() run before initializeSDK() and any views access the shared instance; reference the SettingsViewModel.shared singleton and initializeSDK() in the comment to make the intent clear to future maintainers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift`:
- Line 67: Add a brief inline comment above the `_ = SettingsViewModel.shared`
line explaining that this forces early initialization of the SettingsViewModel
singleton so loadSettings(), setupObservers(), and
subscribeToModelNotifications() run before initializeSDK() and any views access
the shared instance; reference the SettingsViewModel.shared singleton and
initializeSDK() in the comment to make the intent clear to future maintainers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5405eb26-148a-41f9-b588-910fe7bbf60d
📒 Files selected for processing (1)
examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/LoraExamplePrompts.swift (1)
30-33: Consider the implications of sensitive demo prompts.

The example prompts for the abliterated adapter include potentially sensitive content (lock-picking instructions). While this demonstrates the adapter's uncensored capabilities, consider whether these specific prompts are appropriate for a public demo application that may be used in presentations or shared screenshots (as seen in this PR's description).
A safer alternative might be prompts that test reduced restrictions without encouraging potentially harmful outputs:
- "Discuss the philosophical arguments for and against free speech"
- "What are some morally complex dilemmas in medical ethics?"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/LoraExamplePrompts.swift` around lines 30 - 33, Replace the sensitive demo prompt in LoraExamplePrompts.swift for the adapter key "qwen2.5-0.5b-abliterated-lora-f16.gguf": remove or replace the "Explain how lock picking works in detail" example with a non-harmful test prompt that still exercises reduced restrictions (e.g., "Discuss the philosophical arguments for and against free speech" or "What are some morally complex dilemmas in medical ethics?"); update the array for the "qwen2.5-0.5b-abliterated-lora-f16.gguf" entry to use the safer prompt(s) so public demos/screenshots do not display potentially dangerous instructions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/DemoLoRAAdapter.swift`:
- Around line 71-81: The LoraAdapterCatalogEntry for id "abliterated-lora"
currently sets fileSize to 0 which causes incorrect download progress and
storage calculations; update the fileSize property on that
LoraAdapterCatalogEntry (the instance with id "abliterated-lora" and filename
"qwen2.5-0.5b-abliterated-lora-f16.gguf") to the actual byte size 17_620_224 so
UI progress and storage estimates use the correct value.
---
Nitpick comments:
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/LoraExamplePrompts.swift`:
- Around line 30-33: Replace the sensitive demo prompt in
LoraExamplePrompts.swift for the adapter key
"qwen2.5-0.5b-abliterated-lora-f16.gguf": remove or replace the "Explain how
lock picking works in detail" example with a non-harmful test prompt that still
exercises reduced restrictions (e.g., "Discuss the philosophical arguments for
and against free speech" or "What are some morally complex dilemmas in medical
ethics?"); update the array for the "qwen2.5-0.5b-abliterated-lora-f16.gguf"
entry to use the safer prompt(s) so public demos/screenshots do not display
potentially dangerous instructions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1474e7b1-f815-43c6-b174-58a3a0956914
📒 Files selected for processing (3)
- examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/DemoLoRAAdapter.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/LoraExamplePrompts.swift
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/ios/RunAnywhereAI/RunAnywhereAI/App/RunAnywhereAIApp.swift
```swift
// --- Abliterated adapter (uncensored fine-tune for Qwen 2.5 0.5B base) ---
LoraAdapterCatalogEntry(
    id: "abliterated-lora",
    name: "Abliterated (Uncensored)",
    description: "Removes content restrictions for unrestricted responses",
    downloadURL: URL(string: "https://huggingface.co/Void2377/qwen-lora-gguf/resolve/main/qwen2.5-0.5b-abliterated-lora-f16.gguf")!,
    filename: "qwen2.5-0.5b-abliterated-lora-f16.gguf",
    compatibleModelIds: ["qwen2.5-0.5b-base-q8_0"],
    fileSize: 0,
    defaultScale: 1.0
),
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Description: Fetch the actual file size of the abliterated adapter from HuggingFace
# Expected: Returns the Content-Length header showing the file size in bytes
curl -sI "https://huggingface.co/Void2377/qwen-lora-gguf/resolve/main/qwen2.5-0.5b-abliterated-lora-f16.gguf" | grep -i content-length
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 94
🏁 Script executed:
```shell
# First, let's check the actual current state of the file
cat -n examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/DemoLoRAAdapter.swift | sed -n '70,85p'
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 805
🏁 Script executed:
```shell
# The curl command may be returning redirect HTML. Let's try a different approach
# to get the actual binary file size using HuggingFace's API or checking headers differently
curl -sL -o /dev/null -w "Content-Length: %{size_download} bytes\n" "https://huggingface.co/Void2377/qwen-lora-gguf/resolve/main/qwen2.5-0.5b-abliterated-lora-f16.gguf"
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 104
Update fileSize to match actual HuggingFace file size.
The fileSize: 0 should be updated to 17_620_224 (the actual size in bytes). A zero file size can cause indeterminate progress indicators during download and inaccurate storage requirement estimates shown to users.
Proposed fix
```diff
 LoraAdapterCatalogEntry(
     id: "abliterated-lora",
     name: "Abliterated (Uncensored)",
     description: "Removes content restrictions for unrestricted responses",
     downloadURL: URL(string: "https://huggingface.co/Void2377/qwen-lora-gguf/resolve/main/qwen2.5-0.5b-abliterated-lora-f16.gguf")!,
     filename: "qwen2.5-0.5b-abliterated-lora-f16.gguf",
     compatibleModelIds: ["qwen2.5-0.5b-base-q8_0"],
-    fileSize: 0,
+    fileSize: 17_620_224,
     defaultScale: 1.0
 ),
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```swift
// --- Abliterated adapter (uncensored fine-tune for Qwen 2.5 0.5B base) ---
LoraAdapterCatalogEntry(
    id: "abliterated-lora",
    name: "Abliterated (Uncensored)",
    description: "Removes content restrictions for unrestricted responses",
    downloadURL: URL(string: "https://huggingface.co/Void2377/qwen-lora-gguf/resolve/main/qwen2.5-0.5b-abliterated-lora-f16.gguf")!,
    filename: "qwen2.5-0.5b-abliterated-lora-f16.gguf",
    compatibleModelIds: ["qwen2.5-0.5b-base-q8_0"],
    fileSize: 17_620_224,
    defaultScale: 1.0
),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Chat/Models/DemoLoRAAdapter.swift`
around lines 71 - 81, The LoraAdapterCatalogEntry for id "abliterated-lora"
currently sets fileSize to 0 which causes incorrect download progress and
storage calculations; update the fileSize property on that
LoraAdapterCatalogEntry (the instance with id "abliterated-lora" and filename
"qwen2.5-0.5b-abliterated-lora-f16.gguf") to the actual byte size 17_620_224 so
UI progress and storage estimates use the correct value.
Actionable comments posted: 3
♻️ Duplicate comments (1)
sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceSession.swift (1)
259-269: ⚠️ Potential issue | 🟠 Major

Do not emit `turnCompleted` from the error path.

The `catch` still falls through to `emit(.turnCompleted(...))`, so consumers receive a success-shaped event with partial or stale fields after STT/LLM/TTS fails. That is especially misleading because examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift treats `.turnCompleted` as the signal to go back to the ready state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift around lines 259 - 269, The current catch block logs and emits .error but execution still falls through to emit(.turnCompleted(...)), causing a success-shaped event after failures; update the control flow in RunAnywhere+VoiceSession (the catch handling around the try that produces transcription/cleanedResponse/thinkingContent/synthesizedAudio) so that when you call emit(.error(...)) you do not subsequently call emit(.turnCompleted(...)) — either move emit(.turnCompleted(...)) into the happy-path (inside the try after successful STT/LLM/TTS) or ensure the catch returns/throws/early-exits immediately after logger.error and emit(.error(...)) so .turnCompleted is never emitted on error.
🧹 Nitpick comments (2)
examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/Views/DocumentRAGView.swift (2)
617-642: Minor inconsistency in summary length logic.

When there are 2+ sentences and the first sentence is > 20 chars, the entire first sentence is returned (could be 100+ chars). However, single-sentence text is truncated at 80 chars. This asymmetry may produce unexpectedly long summaries for multi-sentence content with a long first sentence.
Consider capping the first-sentence branch as well, or unifying the truncation approach:
♻️ Optional: Cap first sentence length
```diff
 if sentences.count >= 2 {
     let firstSentence = sentences[0].trimmingCharacters(in: .whitespacesAndNewlines)
-    if firstSentence.count > 20 {
-        return firstSentence + "..."
+    if firstSentence.count > 20 && firstSentence.count <= 80 {
+        return firstSentence + "..."
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/Views/DocumentRAGView.swift` around lines 617 - 642, The thinkingSummary logic can return an arbitrarily long first sentence when there are 2+ sentences (via message.thinkingContent -> sentences -> firstSentence) but trims single-sentence text to 80 chars; unify this by capping the firstSentence branch to the same 80-char truncation rules: when sentences.count >= 2 compute firstSentence, then if firstSentence.count > 80 apply the existing truncation (take prefix(80), cut back to last space if present, append "…") else return firstSentence; keep the existing 80-char truncation logic for the full-thinking branch unchanged.
513-515: Consider simplifying the `hasThinking` computed property.

The nil check is redundant since optional chaining already handles it.
♻️ Simplified version
```diff
 private var hasThinking: Bool {
-    message.thinkingContent != nil && !(message.thinkingContent?.isEmpty ?? true)
+    message.thinkingContent?.isEmpty == false
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/Views/DocumentRAGView.swift` around lines 513 - 515, The computed property hasThinking is doing an unnecessary explicit nil check; simplify it by relying on optional chaining and a single boolean expression using message.thinkingContent (e.g., check message.thinkingContent?.isEmpty == false or use !(message.thinkingContent?.isEmpty ?? true)). Update the hasThinking getter to remove the redundant "!= nil" and only evaluate the optional's isEmpty result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift`:
- Around line 97-114: In SettingsViewModel.init (and before
subscribeToModelNotifications is relied on), synchronously query the SDK for the
current active model/capability and set loadedModelSupportsThinking accordingly
so the flag reflects the present state; e.g. add a call in init that asks
RunAnywhere (or the SDK model API) for the currently loaded model/capability,
update loadedModelSupportsThinking on the main actor, then continue to call
subscribeToModelNotifications() to handle future events via handleSDKEvent;
ensure this check uses the same property/logic as handleSDKEvent so
VoiceAgentViewModel.startConversation sees the correct value even if a model was
loaded before SettingsViewModel was created.
In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Helpers/AdaptiveLayout.swift`:
- Around line 511-519: The current micContent uses
.onTapGesture/.onLongPressGesture on a plain view which strips Button
accessibility; instead wrap micContent in a Button (use the existing action as
the Button's primary action) and add the long-press behavior via
.simultaneousGesture(LongPressGesture(minimumDuration: 0.5).onEnded { _ in
(onLongPress ?? action)() }) or use .contextMenu if the long press shows
options, and also add an accessibilityAction for the long-press so
VoiceOver/Full Keyboard Access can invoke it; update the code paths that
reference micContent, onLongPress, and action accordingly.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift:
- Around line 123-125: resumeListening currently calls startListening() with
try? which both hides errors and causes startAudioLevelMonitoring (launched by
startListening) to be started repeatedly; update resumeListening (and related
state) to first cancel any existing audio-level monitoring task (e.g., cancel
audioLevelMonitoringTask or equivalent) or guard against duplicate monitoring,
then call await startListening() without try? so errors are propagated to the
caller; alternatively adjust startListening/startAudioLevelMonitoring to be
idempotent (check a monitoringTask flag) so repeated calls don't spawn
additional monitoring loops—use the concrete symbols resumeListening,
startListening, startAudioLevelMonitoring, isRunning and whichever variable
holds the monitoring Task to implement cancellation/guarding and remove the try?
suppression.
---
Duplicate comments:
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift:
- Around line 259-269: The current catch block logs and emits .error but
execution still falls through to emit(.turnCompleted(...)), causing a
success-shaped event after failures; update the control flow in
RunAnywhere+VoiceSession (the catch handling around the try that produces
transcription/cleanedResponse/thinkingContent/synthesizedAudio) so that when you
call emit(.error(...)) you do not subsequently call emit(.turnCompleted(...)) —
either move emit(.turnCompleted(...)) into the happy-path (inside the try after
successful STT/LLM/TTS) or ensure the catch returns/throws/early-exits
immediately after logger.error and emit(.error(...)) so .turnCompleted is never
emitted on error.
---
Nitpick comments:
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/Views/DocumentRAGView.swift`:
- Around line 617-642: The thinkingSummary logic can return an arbitrarily long
first sentence when there are 2+ sentences (via message.thinkingContent ->
sentences -> firstSentence) but trims single-sentence text to 80 chars; unify
this by capping the firstSentence branch to the same 80-char truncation rules:
when sentences.count >= 2 compute firstSentence, then if firstSentence.count >
80 apply the existing truncation (take prefix(80), cut back to last space if
present, append "…") else return firstSentence; keep the existing 80-char
truncation logic for the full-thinking branch unchanged.
- Around line 513-515: The computed property hasThinking is doing an unnecessary
explicit nil check; simplify it by relying on optional chaining and a single
boolean expression using message.thinkingContent (e.g., check
message.thinkingContent?.isEmpty == false or use
!(message.thinkingContent?.isEmpty ?? true)). Update the hasThinking getter to
remove the redundant "!= nil" and only evaluate the optional's isEmpty result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d42810eb-8808-4ba3-b051-bcfa7dca7db1
📒 Files selected for processing (7)
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/ViewModels/RAGViewModel.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/RAG/Views/DocumentRAGView.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAssistantView.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Helpers/AdaptiveLayout.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceSession.swift
```swift
init() {
    loadSettings()
    setupObservers()
    subscribeToModelNotifications()
}

private func subscribeToModelNotifications() {
    // Subscribe to SDK events directly so any LLM model load
    // (from chat, voice agent, or RAG) updates the thinking mode flag.
    RunAnywhere.events.events
        .receive(on: DispatchQueue.main)
        .sink { [weak self] event in
            Task { @MainActor in
                self?.handleSDKEvent(event)
            }
        }
        .store(in: &cancellables)
}
```
Sync the current LLM capability before relying on notifications.
This only subscribes to future load/unload events. If SettingsViewModel.shared is first instantiated after an LLM is already loaded, loadedModelSupportsThinking stays false, and VoiceAgentViewModel.startConversation() then disables thinking mode for that session even when the active model supports it. Prime this flag from the current SDK/model state during initialization, then use notifications for updates.
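One possible shape for the priming, sketched below. `RunAnywhere.currentLLMModel` and `supportsThinking` are hypothetical names; use whatever accessor the SDK actually exposes for the active model's metadata, mirroring the logic already in `handleSDKEvent`.

```swift
init() {
    loadSettings()
    setupObservers()
    // Prime the flag from the current SDK state before relying on events,
    // in case a model was already loaded when this singleton is created.
    // RunAnywhere.currentLLMModel is a hypothetical accessor.
    if let model = RunAnywhere.currentLLMModel {
        loadedModelSupportsThinking = model.supportsThinking
    }
    subscribeToModelNotifications()
}
```

With this, `VoiceAgentViewModel.startConversation()` sees a correct value regardless of whether the model loaded before or after `SettingsViewModel` was created.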
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Settings/SettingsViewModel.swift`
around lines 97 - 114, In SettingsViewModel.init (and before
subscribeToModelNotifications is relied on), synchronously query the SDK for the
current active model/capability and set loadedModelSupportsThinking accordingly
so the flag reflects the present state; e.g. add a call in init that asks
RunAnywhere (or the SDK model API) for the currently loaded model/capability,
update loadedModelSupportsThinking on the main actor, then continue to call
subscribeToModelNotifications() to handle future events via handleSDKEvent;
ensure this check uses the same property/logic as handleSDKEvent so
VoiceAgentViewModel.startConversation sees the correct value even if a model was
loaded before SettingsViewModel was created.
```swift
if #available(iOS 26.0, macOS 26.0, *) {
    Button(action: action) {
        ZStack {
            // Background circle
            Circle()
                .fill(isActive ? activeColor : inactiveColor)
                .frame(width: AdaptiveSizing.micButtonSize, height: AdaptiveSizing.micButtonSize)

            // Pulsing effect when active
            if isPulsing {
                Circle()
                    .stroke(Color.white.opacity(0.4), lineWidth: 2)
                    .frame(width: AdaptiveSizing.micButtonSize, height: AdaptiveSizing.micButtonSize)
                    .scaleEffect(1.3)
                    .opacity(0)
                    .animation(
                        .easeOut(duration: 1.0).repeatForever(autoreverses: false),
                        value: isPulsing
                    )
            }

            // Icon or loading indicator
            if isLoading {
                ProgressView()
                    .progressViewStyle(CircularProgressViewStyle(tint: .white))
                    .scaleEffect(1.2)
            } else {
                Image(systemName: icon)
                    .font(.system(size: AdaptiveSizing.micIconSize))
                    .foregroundColor(.white)
                    .contentTransition(.symbolEffect(.replace))
                    .animation(.smooth(duration: 0.3), value: icon)
            }
        }
    }
    .buttonStyle(.plain)
    .glassEffect(.regular.interactive())
    micContent
        .onLongPressGesture(minimumDuration: 0.5, perform: { onLongPress?() ?? action() })
        .onTapGesture(perform: action)
        .glassEffect(.regular.interactive())
} else {
    Button(action: action) {
        ZStack {
            // Background circle
            Circle()
                .fill(isActive ? activeColor : inactiveColor)
                .frame(width: AdaptiveSizing.micButtonSize, height: AdaptiveSizing.micButtonSize)

            // Pulsing effect when active
            if isPulsing {
                Circle()
                    .stroke(Color.white.opacity(0.4), lineWidth: 2)
                    .frame(width: AdaptiveSizing.micButtonSize, height: AdaptiveSizing.micButtonSize)
                    .scaleEffect(1.3)
                    .opacity(0)
                    .animation(
                        .easeOut(duration: 1.0).repeatForever(autoreverses: false),
                        value: isPulsing
                    )
            }

            // Icon or loading indicator
            if isLoading {
                ProgressView()
                    .progressViewStyle(CircularProgressViewStyle(tint: .white))
                    .scaleEffect(1.2)
            } else {
                Image(systemName: icon)
                    .font(.system(size: AdaptiveSizing.micIconSize))
                    .foregroundColor(.white)
                    .contentTransition(.symbolEffect(.replace))
                    .animation(.smooth(duration: 0.3), value: icon)
            }
        }
    }
    .buttonStyle(.plain)
    micContent
        .onLongPressGesture(minimumDuration: 0.5, perform: { onLongPress?() ?? action() })
        .onTapGesture(perform: action)
```
🧩 Analysis chain
🌐 Web query:
Does a SwiftUI view using .onTapGesture / .onLongPressGesture provide the same accessibility button traits and keyboard activation as Button, and what is the recommended way to add a long-press action while preserving Button semantics?
💡 Result:
No.
- A view that you make tappable with `.onTapGesture`/`.onLongPressGesture` is still just "a view with gestures"; it doesn't automatically become an accessible control with Button semantics (button trait, default "activate" action, keyboard/focus activation behavior, etc.). This is commonly called out as something you "lose" vs. using `Button`. [2] Some tooling even flags `.onTapGesture` as an accessibility issue unless you explicitly add `.isButton`/`.isLink`. [3]
- By contrast, SwiftUI `Button` is treated as a button by accessibility by default (it carries the button trait, which is why Apple advises you not to put "button" in the accessibility label). [1], [4]

Recommended way to add a long-press action and keep Button semantics

Keep the primary interaction as a real Button, then layer a long-press interaction on top:

Option A (best when long-press is "more actions"): use contextMenu on the Button
- iOS's standard long-press pattern is a context menu, and Apple explicitly recommends that if you "overload a long press" / provide a context menu, you should also expose equivalent actions via accessibility (e.g., VoiceOver actions rotor / keyboard command). [5]

Option B (when long-press is a distinct secondary action): attach a long-press gesture to the Button
- e.g. `Button { primary() } label: { … } .simultaneousGesture(LongPressGesture().onEnded { _ in secondary() })` (a common SwiftUI approach). [6]
- If that secondary action matters for accessibility, also expose it as an accessibility action (so it's discoverable/triggerable without performing a long-press). [5]
🏁 Script executed:

```shell
cd examples/ios/RunAnywhereAI/RunAnywhereAI/Helpers && wc -l AdaptiveLayout.swift
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1911

🏁 Script executed:

```shell
cd examples/ios/RunAnywhereAI/RunAnywhereAI/Helpers && sed -n '490,530p' AdaptiveLayout.swift
```

Repository: RunanywhereAI/runanywhere-sdks
Length of output: 1911
Use Button with layered long-press gesture to preserve accessibility.
Replacing a Button with .onTapGesture / .onLongPressGesture on a plain view removes the automatic button accessibility traits, keyboard activation, and VoiceOver support. This makes the primary mic control inaccessible to users with VoiceOver and Full Keyboard Access.
Keep Button as the base and add long-press behavior on top using one of these approaches:
- Use `contextMenu` on the Button if long-press shows additional options (iOS standard pattern)
- Use `.simultaneousGesture(LongPressGesture())` on the Button if long-press is a distinct secondary action
If the long-press action is important, also expose it as an accessibility action so it's discoverable and triggerable without requiring a gesture.
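A minimal sketch of the recommended shape, reusing the existing `micContent`, `action`, and `onLongPress` symbols from AdaptiveLayout.swift (the accessibility action name "Hold to talk" is illustrative):

```swift
// Keep Button as the base so the control retains the button accessibility
// trait and keyboard activation; layer the long-press on top.
Button(action: action) {
    micContent
}
.buttonStyle(.plain)
.simultaneousGesture(
    LongPressGesture(minimumDuration: 0.5)
        .onEnded { _ in (onLongPress ?? action)() }
)
// Expose the long-press as a named accessibility action so VoiceOver and
// Full Keyboard Access users can trigger it without performing the gesture.
.accessibilityAction(named: "Hold to talk") {
    (onLongPress ?? action)()
}
```

The same pattern applies to both the iOS 26 branch (with `.glassEffect`) and the fallback branch.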
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ios/RunAnywhereAI/RunAnywhereAI/Helpers/AdaptiveLayout.swift` around
lines 511 - 519, The current micContent uses .onTapGesture/.onLongPressGesture
on a plain view which strips Button accessibility; instead wrap micContent in a
Button (use the existing action as the Button's primary action) and add the
long-press behavior via .simultaneousGesture(LongPressGesture(minimumDuration:
0.5).onEnded { _ in (onLongPress ?? action)() }) or use .contextMenu if the long
press shows options, and also add an accessibilityAction for the long-press so
VoiceOver/Full Keyboard Access can invoke it; update the code paths that
reference micContent, onLongPress, and action accordingly.
```swift
public func resumeListening() async {
    guard isRunning else { return }
    try? await startListening()
```
resumeListening() should not silently stack another monitor loop.
startListening() always launches startAudioLevelMonitoring() again (Lines 146-160), and none of those tasks are cancelled while isRunning stays true. In push-to-talk mode, every resume adds another .listening loop; try? also hides a failed restart from the caller. Track/cancel the monitoring task before restarting and surface failures from this API.
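One possible shape, sketched under two assumptions: a stored `audioLevelMonitoringTask` property holds the Task started by `startAudioLevelMonitoring()` (the actual property name in the extension may differ), and surfacing errors means changing the signature to `async throws`:

```swift
public func resumeListening() async throws {
    guard isRunning else { return }
    // Cancel any previous monitoring loop before starting a new one,
    // so push-to-talk resumes don't stack duplicate .listening loops.
    audioLevelMonitoringTask?.cancel()
    audioLevelMonitoringTask = nil
    // Propagate failures instead of swallowing them with try?.
    try await startListening()
}
```

Alternatively, `startAudioLevelMonitoring()` could guard on its own task handle so repeated calls are idempotent, leaving `resumeListening()` unchanged apart from the error propagation.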
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere`+VoiceSession.swift
around lines 123 - 125, resumeListening currently calls startListening() with
try? which both hides errors and causes startAudioLevelMonitoring (launched by
startListening) to be started repeatedly; update resumeListening (and related
state) to first cancel any existing audio-level monitoring task (e.g., cancel
audioLevelMonitoringTask or equivalent) or guard against duplicate monitoring,
then call await startListening() without try? so errors are propagated to the
caller; alternatively adjust startListening/startAudioLevelMonitoring to be
idempotent (check a monitoringTask flag) so repeated calls don't spawn
additional monitoring loops—use the concrete symbols resumeListening,
startListening, startAudioLevelMonitoring, isRunning and whichever variable
holds the monitoring Task to implement cancellation/guarding and remove the try?
suppression.
will post ss of testing in a bit







Summary by CodeRabbit
New Features
Improvements
Bug Fixes